Presentation: "KEYNOTE: A Monadic Model for Big Data"
Since its inception in 1970, Codd's relational model has fueled a multibillion-dollar industry that, after 40 years, still experiences double-digit growth. Because the relational database and SQL are based on the strong mathematical foundations of sets and relations, complementary producers such as educators, tool vendors, and consultants can all target the same underlying conceptual model, creating a strong ecosystem of users and domain experts.
If we look at the relational model through our developer eyes, we notice that the relational algebra is one particular implementation of a more general interface, an interface that mathematicians call monads. By generalizing from relations to arbitrary monads, we are able to query many different kinds of data using a single query language; in particular, we can formulate queries over data of any size, finite or infinite. Similarly, when we use another programmer’s trick and turn the foreign-key/primary-key relationships between flat rows into pointers between nested structures, we can query both relational and pointer-based data, including documents and graphs, using a single monadic algebra of query operators. Lastly, by leveraging the mathematical trick of duality, we can implement our monadic interface over both push-based and pull-based data.
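To make the "one query, any size" idea concrete, here is a minimal sketch in Python, assuming lazy iterables as the pull-based monad. The names `unit`, `bind`, `where`, `select`, and `even_squares` are illustrative choices for this sketch, not an API from the talk.

```python
from itertools import count, islice
from typing import Callable, Iterable, Iterator, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def unit(x: A) -> Iterator[A]:
    """Wrap a single value in a one-element lazy sequence."""
    yield x


def bind(xs: Iterable[A], f: Callable[[A], Iterable[B]]) -> Iterator[B]:
    """Monadic bind (flat-map): apply f to each element and flatten lazily."""
    for x in xs:
        yield from f(x)


def where(xs: Iterable[A], pred: Callable[[A], bool]) -> Iterator[A]:
    """Filtering expressed through bind: keep x or produce nothing."""
    return bind(xs, lambda x: unit(x) if pred(x) else iter(()))


def select(xs: Iterable[A], f: Callable[[A], B]) -> Iterator[B]:
    """Projection expressed through bind."""
    return bind(xs, lambda x: unit(f(x)))


def even_squares(xs: Iterable[int]) -> Iterator[int]:
    """One query definition, independent of the size of the source."""
    return select(where(xs, lambda n: n % 2 == 0), lambda n: n * n)


# Finite source: an ordinary list.
finite = list(even_squares([1, 2, 3, 4, 5, 6]))

# Infinite source: count(1) never ends, so only a prefix is pulled.
infinite_prefix = list(islice(even_squares(count(1)), 3))
```

Because every operator is built on `bind` and evaluation is lazy, the same `even_squares` query runs unchanged over the finite list and the unbounded counter; the consumer decides how much to pull.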
By generalizing from sets and relations to monads, we have created a three-dimensional design space for data, spanned by the dimensions of Volume, Variety, and Velocity, together with a set of monadic standard query operators. Looking at data in the context of this “cube”, we can categorize many data sources according to these three elementary dimensions. For example, a mouse is a database whose (a) volume is infinite, (b) variety is flat, and (c) velocity is push, whereas the typical document database is (a) finite, (b) nested, and (c) pull-based. In other words, monads provide a mathematical and practical basis for what the industry nowadays calls “big data”.
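The mouse example can be sketched as a push-based source that is queried with monadic operators. The following Python is an illustration under stated assumptions: `Push` and `Subject` are hypothetical classes written for this sketch (they are not the API of any real reactive library), and a mouse event is modelled as a plain `(x, y)` tuple.

```python
from typing import Callable, Generic, List, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


class Push(Generic[A]):
    """Hypothetical push-based collection: the source calls the consumer."""

    def __init__(self, subscribe: Callable[[Callable[[A], None]], None]):
        self.subscribe = subscribe

    @staticmethod
    def unit(x):
        # A stream that pushes exactly one value to each subscriber.
        return Push(lambda on_next: on_next(x))

    @staticmethod
    def empty():
        # A stream that never pushes anything.
        return Push(lambda on_next: None)

    def bind(self, f):
        # Monadic bind: for each pushed x, subscribe to the stream f(x).
        return Push(
            lambda on_next: self.subscribe(lambda x: f(x).subscribe(on_next))
        )

    def where(self, pred):
        return self.bind(lambda x: Push.unit(x) if pred(x) else Push.empty())

    def select(self, f):
        return self.bind(lambda x: Push.unit(f(x)))


class Subject:
    """Hypothetical event source standing in for the mouse."""

    def __init__(self) -> None:
        self._observers: List[Callable] = []

    def stream(self) -> Push:
        # Subscribing simply registers the callback with the source.
        return Push(self._observers.append)

    def emit(self, value) -> None:
        for observer in list(self._observers):
            observer(value)


mouse = Subject()
seen: List[int] = []

# Query: keep moves with x > 10, then project each move to x + y.
query = mouse.stream().where(lambda p: p[0] > 10).select(lambda p: p[0] + p[1])
query.subscribe(seen.append)

moves: List[Tuple[int, int]] = [(5, 1), (20, 2), (30, 3)]
for move in moves:
    mouse.emit(move)  # the source pushes; the query never pulls
```

Here the consumer never asks for the next element; the source decides when values arrive, and the stream has no defined end. The query itself is still written with `where` and `select` built on `bind`, which is the point of placing push and pull sources in one design space.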
In this talk we will explain how any programmer could have invented this unified model of big data herself and, perhaps even more importantly, how any modern programming language allows you to use these principles to simplify your day-to-day data programmability problems.